Create SolidQueue worker ECS task #1592
Merged
cadmiumcat
previously approved these changes
Jun 3, 2025
Contributor
cadmiumcat
left a comment
🙌 Super clear PR
And it looks sensible. I agree that we should start with a count of 3 for now and see how we go/enable autoscaling when we're ready.
My only (non-blocking) question was whether we'd want to set the desired count to a different number in each environment?
Contributor
Author
I think this is a good question and a good suggestion. I've just added a commit to make the |
1e1c42f to
9ce2839
* Create ECS service, ECS task definition, security group, and relevant security group rules. I used a pattern I noticed in mailchimp sync, where I took the exported ECS task container definition and overrode some parts of it
* Provide the worker task access to a subset of secrets (I checked with the devs and it will only need access to these secrets, the worker won't need access to NOTIFY_API_KEY or SUBMISSION_STATUS_API)
* Create a specific IAM task execution role for the queue worker with permissions to read relevant Parameter Store values (again, I don't think the queue worker needs access to all the secrets forms-runner has access to).
  * This code somewhat duplicates what's in `ecs-service/iam.tf` and the policies declared in `forms-runner/main.tf`, but I think the duplication is simpler than forcing the ECS task into the ecs-service module format.
  * The worker will use the forms-runner task role (`module.ecs_service.task_definition.task_role_arn`) to ensure it has access to necessary forms-runner related resources (I believe this is required because the worker errors if I use a custom role, since it requires access to the submission_email_ses_bounces_and_complaints SQS queue).
* Make the `desired_count` have a minimum capacity for each environment (for production this is 6, for all other environments it's 1).
  * The capacity is set relatively low for environments (except production) because so far the demand on the queue hasn't been high (queue length rarely exceeds 1) -- Cat raised the good point that we don't want to over-provision
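As a rough illustration of the last two points, the per-environment `desired_count` and the narrowly scoped Parameter Store access could be sketched as below. This is not the code from the PR: `environment_name`, `queue_worker_parameter_arns`, `queue_worker_desired_count`, and `queue_worker_read_parameters` are hypothetical names chosen for the example, and the real implementation may pass the count in per environment rather than derive it from a conditional.

```hcl
# Sketch only: all variable, local, and data source names are hypothetical,
# not taken from the PR.

variable "environment_name" {
  type        = string
  description = "Deployment environment, for example dev, staging, or production"
}

variable "queue_worker_parameter_arns" {
  type        = list(string)
  description = "ARNs of the Parameter Store values the queue worker is allowed to read"
}

locals {
  # Production gets 6 workers; every other environment gets 1.
  queue_worker_desired_count = var.environment_name == "production" ? 6 : 1
}

# Policy document that limits the worker's task execution role to the
# listed parameters instead of everything forms-runner can read.
data "aws_iam_policy_document" "queue_worker_read_parameters" {
  statement {
    effect    = "Allow"
    actions   = ["ssm:GetParameters"]
    resources = var.queue_worker_parameter_arns
  }
}
```

The local would then feed the `desired_count` argument of the worker's `aws_ecs_service`, and the policy document's `json` output would be attached to the worker's execution role.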
9ce2839 to
526b2d8
* Add SSM parameter created from PR #1591
  * I rebased PR #1591 onto this branch in the GitHub UI (not a good idea), and it absorbed the queue_worker.tf file from PR #1591 as a separate file instead of merging the files together.
  * This happened because I didn't take into account the lack of shared file history from git's perspective.
  * The queue_worker.tf on the `create-solidqueue-worker-ecs-task` branch was never branched off from the queue_worker.tf in main (i.e., it was created independently rather than modified from the original). Therefore, Git doesn’t know they’re related. It treats them as two different files that just happen to have the same name.
cadmiumcat
approved these changes
Jun 5, 2025
What problem does this pull request solve?
Trello card: https://trello.com/c/f7js82Cm/635-run-solid-queue-as-its-own-ecs-task
Depends on this PR being merged first: #1591
This pull request creates a new ECS task, service, and relevant configuration to run Solid Queue as a separate worker task in our ECS cluster. It contains the same ENV variables as the `forms-runner` web app (including a specific worker Sentry DSN).

Where possible I reused code, for example the `task_container_definition`, with a few overrides (similar to the `mailchimp-sync` approach).

The new ECS task should have egress access to the VPC, RDS, and the internet, but no ingress access (which happens by default unless otherwise specified in Terraform). The new ECS task should also exist in the same VPC and private subnet group as the other `forms-runner` tasks/services.

I used the same security group rules configured in `modules/ecs-service/security-groups.tf` since they should match the existing `forms-runner` ECS task with the exception of the restricted ingress access.

I've added a HealthCheck which depends on a PR in forms-runner (some Rails code to create the `healthcheck` file, see here).

I can also confirm that logs from the new ECS task are shipped to Splunk; use the query `index="gds_dsp_dev_forms" log_stream="forms-runner-dev-queue-worker*"` to view them.

I've tested the changes in `dev` (and locally) and have attached some screenshots:

Note on autoscaling
After conversations with Andy and Sean, we agreed to hold off on autoscaling for now. We want to see how the worker task handles existing load and we can adjust as part of a follow up.
Happy to talk through this decision and would love to hear thoughts on the `desired_count = 3`.

For context, the queue length hasn't exceeded `1` in production in the last three months.

Checklist

`bin/jobs`

Things to consider when reviewing